1,192 research outputs found

    Self-Attention Networks for Connectionist Temporal Classification in Speech Recognition

    The success of self-attention in NLP has led to recent applications in end-to-end encoder-decoder architectures for speech recognition. Separately, connectionist temporal classification (CTC) has matured as an alignment-free, non-autoregressive approach to sequence transduction, either by itself or in various multitask and decoding frameworks. We propose SAN-CTC, a deep, fully self-attentional network for CTC, and show it is tractable and competitive for end-to-end speech recognition. SAN-CTC trains quickly and outperforms existing CTC models and most encoder-decoder models, with character error rates (CERs) of 4.7% in 1 day on WSJ eval92 and 2.8% in 1 week on LibriSpeech test-clean, with a fixed architecture and one GPU. Similar improvements hold for WERs after LM decoding. We motivate the architecture for speech, evaluate position and downsampling approaches, and explore how label alphabets (character, phoneme, subword) affect attention heads and performance.
    Comment: Accepted to ICASSP 2019
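
    As a rough sketch of the general recipe the abstract describes (a deep self-attention encoder trained directly with the CTC loss), one might write the following in PyTorch; the layer sizes, the convolutional downsampling, and the omission of positional encodings are our assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn

class SelfAttentionCTC(nn.Module):
    def __init__(self, n_feats=80, d_model=256, n_heads=4,
                 n_layers=6, n_labels=32):   # n_labels includes the CTC blank
        super().__init__()
        # Convolutional downsampling in time before self-attention;
        # positional encodings are omitted here for brevity.
        self.subsample = nn.Conv1d(n_feats, d_model, kernel_size=3, stride=2)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.out = nn.Linear(d_model, n_labels)

    def forward(self, feats):                # feats: (batch, time, n_feats)
        x = self.subsample(feats.transpose(1, 2)).transpose(1, 2)
        return self.out(self.encoder(x))     # (batch, time', n_labels)

model = SelfAttentionCTC()
ctc_loss = nn.CTCLoss(blank=0)
feats = torch.randn(2, 100, 80)              # two utterances, 100 frames each
log_probs = model(feats).log_softmax(-1).transpose(0, 1)  # (time', batch, labels)
targets = torch.randint(1, 32, (2, 20))      # label indices; 0 reserved for blank
loss = ctc_loss(log_probs, targets,
                torch.full((2,), log_probs.size(0)),
                torch.full((2,), 20))
```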

    Conway's subprime Fibonacci sequences

    It's the age-old recurrence with a twist: sum the last two terms and if the result is composite, divide by its smallest prime divisor to get the next term (e.g., 0, 1, 1, 2, 3, 5, 4, 3, 7, ...). These sequences exhibit pseudo-random behaviour and generally terminate in a handful of cycles, properties reminiscent of 3x+1 and related sequences. We examine the elementary properties of these 'subprime' Fibonacci sequences.
    Comment: 18 pages, 5 figures
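
    The rule is fully specified in the abstract, so a short Python sketch (the helper names are ours) makes it concrete and reproduces the example sequence:

```python
def smallest_prime_factor(n: int) -> int:
    """Smallest prime divisor of n >= 2, found by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # n itself is prime

def subprime_fibonacci(a: int, b: int, terms: int) -> list[int]:
    """Sum the last two terms; if the sum is composite, divide it by
    its smallest prime factor before appending it."""
    seq = [a, b]
    while len(seq) < terms:
        s = seq[-2] + seq[-1]
        p = smallest_prime_factor(s) if s > 3 else s
        if p != s:          # s is composite
            s //= p
        seq.append(s)
    return seq

# Reproduces the example from the abstract:
print(subprime_fibonacci(0, 1, 9))   # [0, 1, 1, 2, 3, 5, 4, 3, 7]
```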

    Deep Contextualized Acoustic Representations For Semi-Supervised Speech Recognition

    We propose a novel approach to semi-supervised automatic speech recognition (ASR). We first exploit a large amount of unlabeled audio data via representation learning, where we reconstruct a temporal slice of filterbank features from past and future context frames. The resulting deep contextualized acoustic representations (DeCoAR) are then used to train a CTC-based end-to-end ASR system using a smaller amount of labeled audio data. In our experiments, we show that systems trained on DeCoAR consistently outperform ones trained on conventional filterbank features, giving 42% and 19% relative improvement over the baseline on WSJ eval92 and LibriSpeech test-clean, respectively. Our approach can drastically reduce the amount of labeled data required; unsupervised pretraining on LibriSpeech followed by supervised training on 100 hours of labeled data achieves performance on par with training directly on all 960 hours. Pre-trained models and code will be released online.
    Comment: Accepted to ICASSP 2020 (oral)
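
    A hedged sketch of the reconstruction objective described above: predict a temporal slice of filterbank frames from the forward state just before it and the backward state just after it. The single-layer BLSTM, the slice length, and the L1 loss are our assumptions, not the authors' exact setup:

```python
import torch
import torch.nn as nn

n_feats, hidden, K = 80, 512, 4          # K: length of the slice to predict

blstm = nn.LSTM(n_feats, hidden, batch_first=True, bidirectional=True)
decoder = nn.Linear(2 * hidden, K * n_feats)

feats = torch.randn(8, 200, n_feats)     # (batch, time, filterbank dims)
t = 100                                  # slice covers frames t .. t+K-1
ctx, _ = blstm(feats)                    # (batch, time, 2 * hidden)

past = ctx[:, t - 1, :hidden]            # forward state: saw frames 0 .. t-1
future = ctx[:, t + K, hidden:]          # backward state: saw frames t+K .. end
pred = decoder(torch.cat([past, future], dim=-1)).view(-1, K, n_feats)
loss = nn.functional.l1_loss(pred, feats[:, t:t + K])
```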

    Masked Language Model Scoring

    Pretrained masked language models (MLMs) require finetuning for most NLP tasks. Instead, we evaluate MLMs out of the box via their pseudo-log-likelihood scores (PLLs), which are computed by masking tokens one by one. We show that PLLs outperform scores from autoregressive language models like GPT-2 in a variety of tasks. By rescoring ASR and NMT hypotheses, RoBERTa reduces an end-to-end LibriSpeech model's WER by 30% relative and adds up to +1.7 BLEU on state-of-the-art baselines for low-resource translation pairs, with further gains from domain adaptation. We attribute this success to PLL's unsupervised expression of linguistic acceptability without a left-to-right bias, greatly improving on scores from GPT-2 (+10 points on island effects, NPI licensing in BLiMP). One can finetune MLMs to give scores without masking, enabling computation in a single inference pass. In all, PLLs and their associated pseudo-perplexities (PPPLs) enable plug-and-play use of the growing number of pretrained MLMs; e.g., we use a single cross-lingual model to rescore translations in multiple languages. We release our library for language model scoring at https://github.com/awslabs/mlm-scoring.
    Comment: ACL 2020 camera-ready (presented July 2020)
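
    Because the scoring procedure is stated precisely (mask each token in turn and sum its log-probability), a compact sketch is easy to give. This uses Hugging Face transformers as a stand-in, not the authors' released mlm-scoring library:

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base").eval()

def pseudo_log_likelihood(sentence: str) -> float:
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    # Skip the special tokens at positions 0 and -1 (<s>, </s>).
    for i in range(1, len(ids) - 1):
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id     # mask token i only
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        total += torch.log_softmax(logits, dim=-1)[ids[i]].item()
    return total  # higher (less negative) = more acceptable

print(pseudo_log_likelihood("Hello world."))
```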

    Public administration in the states and reflections on federalism

    Federalism in Mexico, since its original conception in the Constitution of 1824, has been transforming and adapting itself to the new demands of states and municipalities, moving from centralization to decentralization in order to transfer more powers and responsibilities from the central government to the other levels of government. In parallel, organizations such as the Conferencia Nacional de Gobernadores (Conago) have emerged for the states, and associations such as the Conferencia Nacional de Municipios de México (Conamm) for the municipalities, acting as a real counterweight that strengthens federalism. In what follows, we analyze some aspects of decentralization that are contributing to this strengthening in Mexico.

    Contextual Phonetic Pretraining for End-to-end Utterance-level Language and Speaker Recognition

    Pretrained contextual word representations in NLP have greatly improved performance on various downstream tasks. For speech, we propose contextual frame representations that capture phonetic information at the acoustic frame level and can be used for utterance-level language, speaker, and speech recognition. These representations come from the frame-wise intermediate representations of an end-to-end, self-attentive ASR model (SAN-CTC) on spoken utterances. We first train the model on the Fisher English corpus with context-independent phoneme labels, then use its representations at inference time as features for task-specific models on the NIST LRE07 closed-set language recognition task and a Fisher speaker recognition task, giving significant improvements over the state-of-the-art on both (e.g., language EER of 4.68% on 3sec utterances, 23% relative reduction in speaker EER). Results remain competitive when using a novel dilated convolutional model for language recognition, or when ASR pretraining is done with character labels only.
    Comment: submitted to INTERSPEECH 2019
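
    The transfer recipe (tap frame-wise intermediate representations of a pretrained ASR model, then pool them over time for an utterance-level task) can be sketched with a forward hook; the stand-in encoder and classifier below are purely illustrative, not the SAN-CTC model itself:

```python
import torch
import torch.nn as nn

pretrained = nn.Sequential(              # stand-in for the pretrained ASR encoder
    nn.Linear(80, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),      # <- intermediate layer we tap
    nn.Linear(256, 43),                  # phoneme posteriors (CTC head)
)

features = {}
def tap(module, inputs, output):
    features["frames"] = output.detach()  # frame-wise reps: (batch, time, 256)

pretrained[3].register_forward_hook(tap)

with torch.no_grad():
    pretrained(torch.randn(1, 300, 80))   # one utterance, 300 acoustic frames

# Mean-pool frame representations into one utterance vector, then feed a
# small task-specific head (e.g., 14-way closed-set language recognition).
utt = features["frames"].mean(dim=1)
lang_head = nn.Linear(256, 14)
lang_logits = lang_head(utt)
```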

    AESTHETICS OF ENTERTAINMENT

    The aesthetics of entertainment bear on literary education: every day we are seduced by the various mass media, which in turn flood our minds with a constant stream of information, satisfying our thirst to visualize our fantasies while we spend our time on something that contributes little to our intellect. Given this pastime, we can see that this media-driven mode of life is more common, in our society and in the world, than we think. As Omar Rincón tells us, "the society of entertainment took shape as a way of life in the United States; music, cinema, films, and certain literary works were conceptualized" around it.

    Implementation of environmental-aspects monitoring for environmental quality during execution of storm drainage works in the Carmen Alto district

    In this study, environmental monitoring samples of air and noise were taken before, during, and after the execution of the project "MEJORAMIENTO Y CREACION DEL SISTEMA DE DRENAJE PLUVIAL DE LA AV. CARMEN ALTO, AV. PERU Y JR. CANGALLO, DISTRITO DE CARMEN ALTO – HUAMANGA – AYACUCHO", allowing us to observe how this public investment project affected air quality and background noise. The methodology used to obtain the results follows the national environmental-quality sampling protocols, and the monitoring results were evaluated against the environmental quality standards (ECAs) for air and for noise. The monitoring carried out made it possible to identify the impacts generated during the project's execution and thus to propose control and mitigation measures for the identified environmental impacts, supporting continuous improvement of the project's management and the control of elevated readings.